Abstract. In this paper, we consider the singular values and singular vectors of finite,
low rank perturbations of large rectangular random matrices. Specifically, we prove
almost sure convergence of the extreme singular values and appropriate projections of
the corresponding singular vectors of the perturbed matrix.

As in the prequel, where we considered the eigenvalues of Hermitian matrices, the
non-random limiting value is shown to depend explicitly on the limiting singular value
distribution of the unperturbed matrix via an integral transform that linearizes rectangular
additive convolution in free probability theory. The asymptotic position of the
extreme singular values of the perturbed matrix differs from that of the original matrix
if and only if the singular values of the perturbing matrix are above a certain critical
threshold which depends on this same integral transform.

We examine the consequence of this singular value phase transition on the associated
left and right singular vectors and discuss the fluctuations around these non-random
limits.



1. Introduction 

In many applications, the $n \times m$ signal-plus-noise data or measurement matrix formed
by stacking the $m$ samples or measurements of $n \times 1$ observation vectors alongside each
other can be modeled as:

\[ \widetilde{X} = \sum_{i=1}^{r} \sigma_i u_i v_i^* + X, \tag{1} \]

where $u_i$ and $v_i$ are left and right "signal" column vectors, the $\sigma_i$ are the associated "signal"
values and $X$ is the noise-only matrix of random noises. This model is ubiquitous in signal
processing [SSI SSj, statistics 121 [3S] and machine learning [H7] and is known under 
various guises as a signal subspace model [50], a latent variable statistical model [39], or 
a probabilistic PCA model [52]. 

Relative to this model, a common application-driven objective is to estimate the signal
subspaces $\mathrm{Span}\{u_1, \ldots, u_r\}$ and $\mathrm{Span}\{v_1, \ldots, v_r\}$ that contain signal energy. This is
accomplished by computing the singular value decomposition (in brief, SVD) of $\widetilde{X}$ and
extracting the $r$ largest singular values and the associated singular vectors of $\widetilde{X}$; these
are referred to as the $r$ principal components [H], and the Eckart-Young-Mirsky theorem
states that they provide the best rank-$r$ approximation of the matrix $\widetilde{X}$ for any unitarily
invariant norm [261 E]. This theoretical justification, combined with the fact that
these vectors can be efficiently computed using now-standard numerical algorithms for
the SVD [21], has led to the ubiquity of the SVD in applications such as array processing
[53], genomics [T], wireless communications [25] and information retrieval [2H], to list a few
[38].



In this paper, motivated by emerging high-dimensional statistical applications [3S], we
place ourselves in the setting where $n$ and $m$ are large and the SVD of $\widetilde{X}$ is used to form
estimates of $\{\sigma_i\}_{i=1}^r$, $\{u_i\}_{i=1}^r$ and $\{v_i\}_{i=1}^r$. We provide a characterization of the relationship
between the estimated extreme singular values of $\widetilde{X}$ and the true "signal" singular values
$\sigma_i$ (and also the angle between the estimated and true singular vectors).

In the limit of large matrices, the extreme singular values only depend on integral
transforms of the distribution of the singular values of the noise-only matrix $X$ in (1) and
exhibit a phase transition about a critical value: this is a new occurrence of the so-called
BBP phase transition, named after the authors of the seminal paper [8]. The critical value
also depends on the aforementioned integral transforms, which arise from rectangular free
probability theory [TTl [T2]. We also characterize the fluctuations of the singular values
about these asymptotic limits. The results obtained are precise in the large matrix limit
and, akin to our results in [17], go beyond answers that might be obtained using matrix
perturbation theory.



Our results are in a certain sense very general (in terms of possible distributions for
the noise model $X$) and recover as a special case results found in the literature for the
eigenvalues [H [9| and eigenvectors [SH HHl IH] of $\widetilde{X}\widetilde{X}^*$ in the setting where $X$ in (1) is
Gaussian. For the Gaussian setting we provide new results for the right singular vectors.
Such results had already been proved in the particular case where $X$ is a Gaussian matrix,
but our approach brings to light a general principle, which can be applied beyond the
Gaussian case. Roughly speaking, this principle says that for $X$ an $n \times p$ matrix (with
$n, p \gg 1$), if one adds an independent small rank perturbation $\sum_{i=1}^{r} \theta_i u_i v_i^*$ to $X$, then
the extreme singular values will move to positions which are approximately the solutions
$z$ of the equations

\[ \frac{1}{n} \operatorname{Tr} \frac{z}{z^2 I - XX^*} \times \frac{1}{p} \operatorname{Tr} \frac{z}{z^2 I - X^*X} = \frac{1}{\theta_i^2}. \]

In the case where these equations have no solutions (which means that the $\theta_i$'s are below
a certain threshold), then the extreme singular values of $X$ will not move significantly.
We also provide similar results for the associated left and right singular vectors and give
limit theorems for the fluctuations. These expressions provide the basis for the parameter
estimation algorithm developed by Hachem et al. in [3Uj.

The papers [171 IE] were devoted to the analogous problem for the eigenvalues of finite
rank perturbations of Hermitian matrices. We follow the strategy developed in these
papers for our proofs: we derive master equation representations that implicitly encode
the relationship between the singular values and singular vectors of $X$ and $\widetilde{X}$ and use
concentration results to obtain the stated analytical expressions. Because of
these similarities in the proofs, we chose to focus, in the present paper, on what differs
from [mils].

At a certain level, our proofs also present analogies with those of other papers devoted
to other occurrences of the BBP phase transition, such as |171 ETJ EH |22l [23]. We mention
that the approach of the paper [16] could also be used to consider large deviations of the
extreme singular values of $\widetilde{X}$.

This paper is organized as follows. We state our main results in Section 2 and provide
some examples in Section 3. The proofs are provided in Sections 4 to 7, with some technical
details relegated to the appendix in Section 8.

2. Main results 

2.1. Definitions and hypotheses. Let $X_n$ be an $n \times m$ real or complex random matrix.
Throughout this paper we assume that $n \le m$ so that we may simplify the exposition
of the proofs. We may do so without loss of generality because in the setting where
$n > m$, the expressions derived will hold for $X_n^*$. Let the $n \le m$ singular values of $X_n$
be $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_n$. Let $\mu_{X_n}$ be the empirical singular value distribution, i.e., the
probability measure defined as

\[ \mu_{X_n} = \frac{1}{n} \sum_{i=1}^{n} \delta_{\sigma_i}. \]

Let $m$ depend on $n$; we denote this dependence explicitly by $m_n$, which we will sometimes
omit for brevity by substituting $m$ for $m_n$. Assume that as $n \to \infty$, $n/m_n \to c \in [0, 1]$.
In the following, we shall need some of the following hypotheses.
In the following, we shall need some of the following hypotheses. 

Assumption 2.1. The probability measure $\mu_{X_n}$ converges almost surely weakly to a non-random
compactly supported probability measure $\mu_X$.

Examples of random matrices satisfying this hypothesis can be found in e.g. [3 [13 
M HB El [IS] . Note however that the question of isolated extreme singular values is not 
addressed in papers like [5l [15] (where moreover the perturbation has a non bounded 
rank) . 

Assumption 2.2. Let $a$ be the infimum of the support of $\mu_X$. The smallest singular value of
$X_n$ converges almost surely to $a$.

Assumption 2.3. Let $b$ be the supremum of the support of $\mu_X$. The largest singular value of
$X_n$ converges almost surely to $b$.

Examples of random matrices satisfying the above hypotheses can be found in e.g. 

In this problem, we shall consider the extreme singular values and the associated singular
vectors of $\widetilde{X}_n$, which is the random $n \times m$ matrix:

\[ \widetilde{X}_n = X_n + P_n, \]

where $P_n$ is defined as described below.

For a given $r \ge 1$, let $\theta_1 \ge \cdots \ge \theta_r > 0$ be deterministic non-zero real numbers, chosen
independently of $n$. For every $n$, let $G_u^{(n)}$, $G_v^{(n)}$ be two independent matrices with sizes
respectively $n \times r$ and $m \times r$, with i.i.d. entries distributed according to a fixed probability
measure $\nu$ on $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$. We introduce the column vectors $u_1, \ldots, u_r \in \mathbb{K}^{n \times 1}$ and
$v_1, \ldots, v_r \in \mathbb{K}^{m \times 1}$ obtained from $G_u^{(n)}$ and $G_v^{(n)}$ by either:

(1) Setting $u_i$ and $v_i$ to equal the $i$-th column of $\frac{1}{\sqrt{n}} G_u^{(n)}$ and $\frac{1}{\sqrt{m}} G_v^{(n)}$ respectively, or

(2) Setting $u_i$ and $v_i$ to equal the vectors obtained from a Gram-Schmidt orthonormalization (or QR
factorization) of $G_u^{(n)}$ and $G_v^{(n)}$ respectively.

We shall refer to the model (1) as the i.i.d. model and to the model (2) as the orthonormalized
model. With the $u_i$'s and $v_i$'s constructed as above, we define the random perturbing
matrix $P_n \in \mathbb{K}^{n \times m}$ as:

\[ P_n = \sum_{i=1}^{r} \theta_i u_i v_i^*. \]

In the orthonormalized model, the $\theta_i$'s are the non-zero singular values of $P_n$ and the $u_i$'s
and the $v_i$'s are the left and right associated singular vectors.
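The two constructions of the perturbing matrix can be sketched in a few lines. This is only an illustration under stated assumptions: the entries are real standard Gaussian (one admissible choice of the law of the entries), the helper names `iid_columns`, `gram_schmidt` and `rank_r_perturbation` are hypothetical, and the sizes and values of the $\theta_i$'s are arbitrary.

```python
import math
import random


def iid_columns(dim, r, rng):
    """Model (1): r columns with i.i.d. N(0, 1) entries, each scaled by 1/sqrt(dim)."""
    scale = 1.0 / math.sqrt(dim)
    return [[scale * rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(r)]


def gram_schmidt(columns):
    """Model (2): orthonormalize the columns (modified Gram-Schmidt, real case)."""
    basis = []
    for col in columns:
        w = list(col)
        for q in basis:
            coeff = sum(qi * wi for qi, wi in zip(q, w))
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis


def rank_r_perturbation(thetas, us, vs):
    """Dense n x m matrix P = sum_i theta_i * u_i * v_i^T (real case)."""
    n, m = len(us[0]), len(vs[0])
    return [[sum(t * u[a] * v[b] for t, u, v in zip(thetas, us, vs))
             for b in range(m)] for a in range(n)]


rng = random.Random(0)           # fixed seed, for reproducibility only
n, m, r = 40, 60, 2              # illustrative sizes
thetas = [2.0, 1.5]              # illustrative signal strengths
u_iid, v_iid = iid_columns(n, r, rng), iid_columns(m, r, rng)
u_orth, v_orth = gram_schmidt(u_iid), gram_schmidt(v_iid)
P = rank_r_perturbation(thetas, u_orth, v_orth)
```

In the orthonormalized model, applying `P` to `v_orth[0]` returns `thetas[0]` times `u_orth[0]`, which is the singular-triplet property stated above.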

We make the following hypothesis on the law $\nu$ of the entries of $G_u^{(n)}$ and $G_v^{(n)}$ (see [3,
Sect. 2.3.2] for the definition of log-Sobolev inequalities).

Assumption 2.4. The probability measure $\nu$ has mean zero and variance one, and
satisfies a log-Sobolev inequality.

Remark 2.5. We also note that if $\nu$ is the standard real or complex Gaussian distribution,
then the singular vectors produced using the orthonormalized model will have uniform
distribution on the set of $r$ orthogonal random vectors.

Remark 2.6. If $X_n$ is random but has a bi-unitarily invariant distribution and $P_n$ is
non-random with rank $r$, then we are in the same setting as the orthonormalized model for
the results that follow. More generally, our idea in defining both of our models (the
i.i.d. one and the orthonormalized one) was to show that if $P_n$ is chosen independently
from $X_n$ in a somehow "isotropic way" (i.e. via a distribution which is not far from
being invariant under the action of the orthogonal group by conjugation), then a BBP phase
transition occurs, which is governed by a certain integral transform of the limiting empirical
singular value distribution of $X_n$, namely $\mu_X$.

Remark 2.7. We note that there is a small albeit non-zero probability that $r$ i.i.d. copies
of a random vector are not linearly independent. Consequently, there is a small albeit
non-zero probability that the $r$ vectors obtained as in (2) via the Gram-Schmidt orthogonalization
may not be well defined. However, in the limit of large matrices, this process
produces well-defined vectors with overwhelming probability (indeed, by Proposition 8.2,
the determinant of the associated $r \times r$ Gram matrix tends to one). This is implicitly
assumed in what follows.

2.2. Notation. Throughout this paper, for $f$ a function and $d \in \mathbb{R}$, we set

\[ f(d^+) := \lim_{z \downarrow d} f(z); \qquad f(d^-) := \lim_{z \uparrow d} f(z). \]

We also let $\xrightarrow{\text{a.s.}}$ denote almost sure convergence. The (ordered) singular values of an $n \times m$
matrix $M$ will be denoted by $\sigma_1(M) \ge \cdots \ge \sigma_n(M)$. Lastly, for a subspace $F$
of a Euclidean space $E$ and a unit vector $x \in E$, we denote the norm of the orthogonal
projection of $x$ onto $F$ by $\langle x, F \rangle$.

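2.3. Largest singular values and singular vectors phase transition. In Theorems
2.8, 2.9 and 2.10, we suppose Assumptions 2.1, 2.3 and 2.4 to hold.

We define $\bar\theta$, the threshold of the phase transition, by the formula

\[ \bar\theta := (D_{\mu_X}(b^+))^{-1/2}, \]

with the convention that $(+\infty)^{-1/2} = 0$, and where $D_{\mu_X}$, the D-transform of $\mu_X$, is the
function, depending on $c$, defined by

\[ D_{\mu_X}(z) := \left[ \int \frac{z}{z^2 - t^2}\, d\mu_X(t) \right] \times \left[ c \int \frac{z}{z^2 - t^2}\, d\mu_X(t) + \frac{1 - c}{z} \right] \quad \text{for } z > b. \]

In the theorems below, $D_{\mu_X}^{-1}(\cdot)$ will denote its functional inverse on $[b, +\infty)$.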
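As an illustration of the definition, the D-transform can be evaluated directly when the measure is a finite combination of point masses. The sketch below assumes such a discrete measure (given by hypothetical lists `atoms` and `weights`); the function names are ours and are not part of the paper.

```python
import math


def phi(z, atoms, weights):
    """phi_mu(z) = integral of z / (z^2 - t^2) dmu(t), for mu = sum_k w_k * delta_{t_k}."""
    return sum(w * z / (z * z - t * t) for t, w in zip(atoms, weights))


def d_transform(z, atoms, weights, c):
    """D_mu(z) = phi_mu(z) * (c * phi_mu(z) + (1 - c) / z), for z above the support."""
    p = phi(z, atoms, weights)
    return p * (c * p + (1.0 - c) / z)


def threshold(d_at_edge):
    """Threshold (D(b+))^(-1/2), with the convention (+inf)^(-1/2) = 0."""
    return 0.0 if math.isinf(d_at_edge) else d_at_edge ** -0.5


# mu = delta_1: phi(2) = 2/3, so D(2) = 4/9 when c = 1 and 7/18 when c = 1/2.
d_square = d_transform(2.0, [1.0], [1.0], 1.0)
d_rect = d_transform(2.0, [1.0], [1.0], 0.5)
```

For a measure with an atom at its right edge, $D(b^+) = +\infty$ and `threshold(math.inf)` returns 0, so every positive $\theta$ is above the threshold.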

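Theorem 2.8 (Largest singular value phase transition). The $r$ largest singular values
of the $n \times m$ perturbed matrix $\widetilde{X}_n$ exhibit the following behavior as $n, m_n \to \infty$ and
$n/m_n \to c$. We have that for each fixed $1 \le i \le r$,

\[ \sigma_i(\widetilde{X}_n) \xrightarrow{\text{a.s.}} \begin{cases} D_{\mu_X}^{-1}(1/\theta_i^2) & \text{if } \theta_i > \bar\theta, \\ b & \text{otherwise.} \end{cases} \]

Moreover, for each fixed $i > r$, we have that $\sigma_i(\widetilde{X}_n) \xrightarrow{\text{a.s.}} b$.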
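The limit in Theorem 2.8 is defined through a functional inverse, which can be computed numerically because the D-transform is decreasing above the support. The following is a minimal sketch, assuming a discrete measure with an atom at its right edge (so the threshold is 0); the function names and the bisection scheme are illustrative choices, not the paper's method.

```python
def phi(z, atoms, weights):
    """phi_mu(z) for the discrete measure mu = sum_k w_k * delta_{t_k}."""
    return sum(w * z / (z * z - t * t) for t, w in zip(atoms, weights))


def d_transform(z, atoms, weights, c):
    """D_mu(z) = phi_mu(z) * (c * phi_mu(z) + (1 - c) / z)."""
    p = phi(z, atoms, weights)
    return p * (c * p + (1.0 - c) / z)


def outlier_limit(theta, atoms, weights, c, iterations=200):
    """Solve D(z) = 1 / theta^2 on (b, +inf) by bisection, with b the largest atom.

    D decreases from +inf (atom at b) to 0, so a unique solution exists
    for every theta > 0 in this discrete setting.
    """
    b = max(atoms)
    target = 1.0 / theta ** 2
    lo, hi = b, b + 1.0
    while d_transform(hi, atoms, weights, c) > target:
        hi *= 2.0                      # expand until the root is bracketed
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if d_transform(mid, atoms, weights, c) > target:
            lo = mid                   # D too large: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


# mu = delta_1, c = 1, theta = 2: the equation z / (z^2 - 1) = 1/2 gives z = 1 + sqrt(2).
limit_square = outlier_limit(2.0, [1.0], [1.0], 1.0)
# A two-atom measure in a rectangular setting (no closed form is claimed here).
limit_rect = outlier_limit(1.0, [0.5, 1.0], [0.5, 0.5], 0.5)
```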

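Theorem 2.9 (Norm of projection of largest singular vectors). Consider indices $i_0 \in
\{1, \ldots, r\}$ such that $\theta_{i_0} > \bar\theta$. For each $n$, define $\tilde\sigma_{i_0} = \sigma_{i_0}(\widetilde{X}_n)$ and let $\tilde u$ and $\tilde v$ be left and
right unit singular vectors of $\widetilde{X}_n$ associated with the singular value $\tilde\sigma_{i_0}$. Then we have, as
$n \to \infty$,

a)
\[ |\langle \tilde u, \mathrm{Span}\{u_i \text{ s.t. } \theta_i = \theta_{i_0}\} \rangle|^2 \xrightarrow{\text{a.s.}} \frac{-2 \varphi_{\mu_X}(\rho)}{\theta_{i_0}^2 D'_{\mu_X}(\rho)}, \tag{2} \]

b)
\[ |\langle \tilde v, \mathrm{Span}\{v_i \text{ s.t. } \theta_i = \theta_{i_0}\} \rangle|^2 \xrightarrow{\text{a.s.}} \frac{-2 \varphi_{\tilde\mu_X}(\rho)}{\theta_{i_0}^2 D'_{\mu_X}(\rho)}, \tag{3} \]

where $\rho = D_{\mu_X}^{-1}(1/\theta_{i_0}^2)$ is the limit of $\tilde\sigma_{i_0}$, $\tilde\mu_X = c\mu_X + (1 - c)\delta_0$, and for any
probability measure $\mu$,

\[ \varphi_\mu(z) := \int \frac{z}{z^2 - t^2}\, d\mu(t). \tag{4} \]

c) Furthermore, in the same asymptotic limit, we have

\[ |\langle \tilde u, \mathrm{Span}\{u_i \text{ s.t. } \theta_i \ne \theta_{i_0}\} \rangle|^2 \xrightarrow{\text{a.s.}} 0 \quad \text{and} \quad |\langle \tilde v, \mathrm{Span}\{v_i \text{ s.t. } \theta_i \ne \theta_{i_0}\} \rangle|^2 \xrightarrow{\text{a.s.}} 0, \]

and

\[ \langle \varphi_{\mu_X}(\rho) P_n \tilde v - \tilde u, \mathrm{Span}\{u_i \text{ s.t. } \theta_i = \theta_{i_0}\} \rangle \xrightarrow{\text{a.s.}} 0. \]

Theorem 2.10 (Largest singular vector phase transition). When $r = 1$, let the sole
singular value of $P_n$ be denoted by $\theta$. Suppose that

\[ \theta \le \bar\theta \quad \text{and} \quad \varphi'_{\mu_X}(b^+) = -\infty. \tag{5} \]

For each $n$, let $\tilde u$ and $\tilde v$ denote, respectively, left and right unit singular vectors of $\widetilde{X}_n$
associated with its largest singular value. Then

\[ \langle \tilde u, \ker(\theta^2 I_n - P_n P_n^*) \rangle \longrightarrow 0 \quad \text{and} \quad \langle \tilde v, \ker(\theta^2 I_m - P_n^* P_n) \rangle \longrightarrow 0, \]

as $n \to \infty$.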

The following proposition allows us to assert that in many classical matrix models, the
threshold $\bar\theta$ of the above phase transitions is positive. The proof relies on a straightforward
computation which we omit.

Proposition 2.11 (Edge density decay condition for phase transition). Assume that the
limiting singular distribution $\mu_X$ has a density $f_{\mu_X}$ with a power decay at $b$, i.e., that, as
$t \to b$ with $t < b$, $f_{\mu_X}(t) \sim M (b - t)^{\alpha}$ for some exponent $\alpha > -1$ and some constant $M$.
Then:

\[ \bar\theta = (D_{\mu_X}(b^+))^{-1/2} > 0 \iff \alpha > 0 \qquad \text{and} \qquad \varphi'_{\mu_X}(b^+) = -\infty \iff \alpha \le 1, \]

so that the phase transitions in Theorems 2.8 and 2.10 manifest for $\alpha = 1/2$.

Remark 2.12 (Necessity of singular value repulsion for the singular vector phase transition).
Under additional hypotheses on the manner in which the empirical singular
distribution of $X_n$ converges to $\mu_X$ as $n \to \infty$, Theorem 2.10 can be generalized to any singular
value with limit $\rho$ such that $D'_{\mu_X}(\rho)$ is infinite. The specific hypothesis has to do with
requiring the spacings between the singular values of $X_n$ to be more "random matrix like"
and exhibit repulsion instead of being "independent sample like" with possible clumping.
We plan to develop this line of inquiry in a separate paper.

2.4. Smallest singular values and vectors for square matrices. We now consider
the phase transition exhibited by the smallest singular values and vectors. We restrict
ourselves to the setting where $X_n$ is a square matrix; this restriction is necessary because
the non-monotonicity of the function $D_{\mu_X}$ on $[0, a)$ when $c = \lim n/m < 1$ poses some
technical difficulties that do not arise in the square setting. Moreover, in Theorems 2.13,
2.14 and 2.15, we suppose Assumptions 2.1, 2.2 and 2.4 to hold.

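We define $\underline{\theta}$, the threshold of the phase transition, by the formula

\[ \underline{\theta} := (-\varphi_{\mu_X}(a^-))^{-1}, \]

with the convention that $(+\infty)^{-1} = 0$, and where $\varphi_{\mu_X}(z) = \int \frac{z}{z^2 - t^2}\, d\mu_X(t)$, as in Equation
(4); note that $\varphi_{\mu_X}$ is negative on $(0, a)$. In the theorems below, $\varphi_{\mu_X}^{-1}(\cdot)$ will denote the functional inverse of the function $\varphi_{\mu_X}(\cdot)$
on $(0, a)$.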

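Theorem 2.13 (Smallest singular value phase transition for square matrices). When
$a > 0$ and $m = n$, the $r$ smallest singular values of $\widetilde{X}_n$ exhibit the following behavior. We
have that for each fixed $1 \le i \le r$,

\[ \sigma_{n+1-i}(\widetilde{X}_n) \xrightarrow{\text{a.s.}} \begin{cases} \varphi_{\mu_X}^{-1}(-1/\theta_i) & \text{if } \theta_i > \underline{\theta}, \\ a & \text{otherwise.} \end{cases} \]

Moreover, for each fixed $i > r$, we have that $\sigma_{n+1-i}(\widetilde{X}_n) \xrightarrow{\text{a.s.}} a$.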

Theorem 2.14 (Norm of projection of smallest singular vector for square matrices).
Consider indices $i_0 \in \{1, \ldots, r\}$ such that $\theta_{i_0} > \underline{\theta}$. For each $n$, define $\tilde\sigma_{i_0} = \sigma_{n+1-i_0}(\widetilde{X}_n)$
and let $\tilde u$ and $\tilde v$ be left and right unit singular vectors of $\widetilde{X}_n$ associated with the singular
value $\tilde\sigma_{i_0}$. Then we have, as $n \to \infty$,

a) 

\{u,Span{ui s.t. Oi = 6'iJ)p ^ , , (6) 

b) 

\{v,Span{vi s.t. Oi = 9i^})\'^ ^ (7) 

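c) Furthermore, in the same asymptotic limit, we have

\[ |\langle \tilde u, \mathrm{Span}\{u_i \text{ s.t. } \theta_i \ne \theta_{i_0}\} \rangle|^2 \longrightarrow 0 \quad \text{and} \quad |\langle \tilde v, \mathrm{Span}\{v_i \text{ s.t. } \theta_i \ne \theta_{i_0}\} \rangle|^2 \longrightarrow 0, \]

and

\[ \langle \varphi_{\mu_X}(\rho) P_n \tilde v - \tilde u, \mathrm{Span}\{u_i \text{ s.t. } \theta_i = \theta_{i_0}\} \rangle \longrightarrow 0. \]

Theorem 2.15 (Smallest singular vector phase transition). When $r = 1$ and $m = n$, let
the smallest singular value of $\widetilde{X}_n$ be denoted by $\tilde\sigma_n$, with $\tilde u$ and $\tilde v$ representing associated
left and right unit singular vectors respectively. Suppose that

\[ a > 0, \quad \theta \le \underline{\theta} \quad \text{and} \quad \varphi'_{\mu_X}(a^-) = -\infty. \]

Then

\[ \langle \tilde u, \ker(\theta^2 I_n - P_n P_n^*) \rangle \longrightarrow 0 \quad \text{and} \quad \langle \tilde v, \ker(\theta^2 I_m - P_n^* P_n) \rangle \longrightarrow 0, \]

as $n \to \infty$.

The analogue of Remark 2.12 also applies here.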

2.5. The D-transform in free probability theory. The C-transform with ratio $c$ of
a probability measure $\mu$ on $\mathbb{R}_+$, defined as:

\[ C_\mu(z) = U\left( z (D_\mu^{-1}(z))^2 - 1 \right), \tag{8} \]

where the function $U$, defined as:

\[ U(z) = \begin{cases} \dfrac{-c - 1 + \left[ (c + 1)^2 + 4cz \right]^{1/2}}{2c} & \text{when } c > 0, \\ z & \text{when } c = 0, \end{cases} \]

is the analogue of the logarithm of the Fourier transform for the rectangular free convolution
with ratio $c$ (see [T51 [T3j for an introduction to the theory of rectangular free
convolution) in the sense described next.

Let $A_n$ and $B_n$ be independent $n \times m$ rectangular random matrices that are invariant,
in law, by conjugation by any orthogonal (or unitary) matrix. Suppose that, as $n, m \to
\infty$ with $n/m \to c$, the empirical singular value distributions $\mu_{A_n}$ and $\mu_{B_n}$ of $A_n$ and
$B_n$ satisfy $\mu_{A_n} \to \mu_A$ and $\mu_{B_n} \to \mu_B$. Then by [TT], the empirical singular value
distribution $\mu_{A_n + B_n}$ of $A_n + B_n$ satisfies $\mu_{A_n + B_n} \to \mu_A \boxplus_c \mu_B$, where $\mu_A \boxplus_c \mu_B$ is a
probability measure which can be characterized in terms of the C-transform as

\[ C_{\mu_A \boxplus_c \mu_B}(z) = C_{\mu_A}(z) + C_{\mu_B}(z). \]

The coefficients of the series expansion of $C_\mu(z)$ are the rectangular free cumulants with
ratio $c$ of $\mu$ (see [12] for an introduction to the rectangular free cumulants). The connection
between free rectangular additive convolution and $D_\mu^{-1}$ (via the C-transform) and the
appearance of $D_\mu^{-1}$ in Theorem 2.8 could be of independent interest to free probabilists:
the emergence of this transform in the study of isolated singular values completes the picture
of [17], where the transforms linearizing additive and multiplicative free convolutions
already appeared in similar contexts.
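As a small worked illustration of definition (8), the sketch below evaluates $U$ and the C-transform for the Gaussian model treated in the examples section, whose inverse D-transform has the closed form $\sqrt{(z+1)(cz+1)/z}$. The function names are hypothetical; the point is only that, in this case, the composition collapses to $C(z) = z$, since $U$ inverts $z \mapsto (cz+1)(z+1) - 1$ on the positive half-line.

```python
import math


def u_func(z, c):
    """U(z) from the definition of the C-transform with ratio c."""
    if c == 0:
        return z
    return (-(c + 1.0) + math.sqrt((c + 1.0) ** 2 + 4.0 * c * z)) / (2.0 * c)


def d_inverse_gaussian(z, c):
    """Closed-form inverse D-transform of the Gaussian model (assumed from the examples)."""
    return math.sqrt((z + 1.0) * (c * z + 1.0) / z)


def c_transform_gaussian(z, c):
    """C(z) = U(z * (D^{-1}(z))^2 - 1), specialised to the Gaussian model."""
    return u_func(z * d_inverse_gaussian(z, c) ** 2 - 1.0, c)


value_rect = c_transform_gaussian(0.7, 0.3)   # equals 0.7 up to rounding
value_c0 = c_transform_gaussian(0.7, 0.0)     # the c = 0 branch of U
```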

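2.6. Fluctuations of the largest singular value. Assume that the empirical singular
value distribution of $X_n$ converges to $\mu_X$ faster than $1/\sqrt{n}$. More precisely,

Assumption 2.16. We have

\[ \frac{n}{m_n} = c + o\left( \frac{1}{\sqrt{n}} \right), \]

$r = 1$, $\theta := \theta_1 > \bar\theta$ and

\[ \frac{1}{n} \operatorname{Tr} (\rho^2 I_n - X_n X_n^*)^{-1} = \int \frac{1}{\rho^2 - t^2}\, d\mu_X(t) + o\left( \frac{1}{\sqrt{n}} \right) \]

for $\rho = D_{\mu_X}^{-1}(1/\theta^2)$ the limit of $\sigma_1(\widetilde{X}_n)$.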

We also make the following hypothesis on the law $\nu$ (note that it does not include the
requirement that $\nu$ be symmetric). In fact, if it did not hold, we would still have a limit theorem
on the fluctuations of the largest singular value, as in Theorem 3.4 of [13], but we chose
not to develop this case.

Assumption 2.17. If $\nu$ is entirely supported by the real line, $\int x^4\, d\nu(x) = 3$. If $\nu$ is not
entirely supported by the real line, the real and imaginary parts of a $\nu$-distributed random
variable are independent and identically distributed with $\int |z|^4\, d\nu(z) = 2$.

Note that we do not ask $\nu$ to be symmetric and make no hypothesis about its third
moment. The reason is that the main ingredient of the following theorem is Theorem
6.4 of [15] (or Theorem 7.1 of [7]), where no hypothesis of symmetry or about the third
moment is made.



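Theorem 2.18. Suppose Assumptions 2.1, 2.3, 2.4, 2.16 and 2.17 to hold. Let $\tilde\sigma_1$ denote
the largest singular value of $\widetilde{X}_n$. Then as $n \to \infty$,

\[ n^{1/2} (\tilde\sigma_1 - \rho) \xrightarrow{\mathcal{D}} \mathcal{N}(0, s^2), \]

where $\rho = D_{\mu_X}^{-1}(1/\theta^2)$ and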



MX ' 

2 



2/3 
2/3 



for the i.i.d. model, 

for the orthonormalized model.
with $\beta = 1$ (or 2) when $X$ is real (or complex) and

J (o^-fi)^ J 



(p2-t2)2 
p2-t2 J 



I 



p2_t2 



+ 2 



J p'2~fi J p2„(2 



with $\tilde\mu_X = c\mu_X + (1 - c)\delta_0$.



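2.7. Fluctuations of the smallest singular value of square matrices. When
$m_n = n$ so that $c = 1$, assume that:

Assumption 2.19. For all $n$, $m_n = n$, $r = 1$, $\theta := \theta_1 > \underline{\theta}$ and

\[ \frac{1}{n} \operatorname{Tr} (\rho^2 I_n - X_n X_n^*)^{-1} = \int \frac{1}{\rho^2 - t^2}\, d\mu_X(t) + o\left( \frac{1}{\sqrt{n}} \right) \]

for $\rho := \varphi_{\mu_X}^{-1}(-1/\theta)$ the limit of the smallest singular value of $\widetilde{X}_n$.

Theorem 2.20. Suppose Assumptions 2.1, 2.2, 2.4, 2.19 and 2.17 to hold. Let $\tilde\sigma_n$ denote
the smallest singular value of $\widetilde{X}_n$. Then as $n \to \infty$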



n 



where 



( f2 



f 

—— for the i.i.d. model 

2(5 ^ 



I 2/3 



for the orthonormalized model 



with (5 = 1 (or 2) when X is real (or complex) and := 26"^ J ^^i^dpx(^) 



3. Examples 



3.1. Gaussian rectangular random matrices with non-zero mean. Let $X_n$ be an
$n \times m$ real (or complex) matrix with independent, zero mean, normally distributed entries
with variance $1/m$. It is known [101 E] that, as $n, m \to \infty$ with $n/m \to c \in (0, 1]$, the
spectral measure of the singular values of $X_n$ converges to the distribution with density

\[ d\mu_X(x) = \frac{\sqrt{4c - (x^2 - 1 - c)^2}}{\pi c x}\, \mathbb{1}_{(a,b)}(x)\, dx, \]

where $a = 1 - \sqrt{c}$ and $b = 1 + \sqrt{c}$ are the end points of the support of $\mu_X$. It is known
[H] that the extreme singular values converge to the bounds of this support.

Associated with this singular measure, we have, by an application of the result in [TUl
Sect. 4.1] and Equation (8),

\[ D_{\mu_X}^{-1}(z) = \sqrt{\frac{(z + 1)(cz + 1)}{z}}, \qquad D_{\mu_X}(z) = \frac{z^2 - (c + 1) - \sqrt{(z^2 - (c + 1))^2 - 4c}}{2c}, \qquad D_{\mu_X}(b^+) = \frac{1}{\sqrt{c}}. \]

Thus for any $n \times m$ deterministic matrix $P_n$ with $r$ non-zero singular values $\theta_1 \ge \cdots \ge \theta_r$
($r$ independent of $n, m$), for any fixed $i \ge 1$, by Theorem 2.8, we have

\[ \sigma_i(X_n + P_n) \longrightarrow \begin{cases} \sqrt{\dfrac{(1 + \theta_i^2)(c + \theta_i^2)}{\theta_i^2}} & \text{if } i \le r \text{ and } \theta_i > c^{1/4}, \\ 1 + \sqrt{c} & \text{otherwise,} \end{cases} \tag{9} \]

as $n \to \infty$. As far as the i.i.d. model is concerned, this formula allows us to recover
some of the results of [S].

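Now, let us turn our attention to the singular vectors. In the setting where $r = 1$, let
$P_n = \theta u v^*$. Then, by Theorems 2.9 and 2.10, we have

\[ |\langle \tilde u, u \rangle|^2 \longrightarrow \begin{cases} 1 - \dfrac{c (1 + \theta^2)}{\theta^2 (\theta^2 + c)} & \text{if } \theta \ge c^{1/4}, \\ 0 & \text{otherwise.} \end{cases} \tag{10} \]

The phase transitions for the eigenvectors of $\widetilde{X}_n^* \widetilde{X}_n$ or for the pairs of singular vectors
of $\widetilde{X}_n$ can be similarly computed to yield the expression:

\[ |\langle \tilde v, v \rangle|^2 \longrightarrow \begin{cases} 1 - \dfrac{c + \theta^2}{\theta^2 (\theta^2 + 1)} & \text{if } \theta \ge c^{1/4}, \\ 0 & \text{otherwise.} \end{cases} \tag{11} \]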
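The closed forms (9), (10) and (11) can be cross-checked numerically against the general expressions of Theorems 2.8 and 2.9. The sketch below is an illustration only: it uses the closed-form Gaussian D-transform, recovers $\varphi_{\mu_X}$ from $D_{\mu_X} = \varphi_{\mu_X} (c \varphi_{\mu_X} + (1-c)/z)$ by solving a quadratic, and approximates $D'_{\mu_X}$ with a central finite difference (the step `h` is an arbitrary choice). All function names are hypothetical.

```python
import math


def d_gauss(z, c):
    """Closed-form D-transform of the Gaussian model, valid for z > 1 + sqrt(c)."""
    w = z * z - (c + 1.0)
    return (w - math.sqrt(w * w - 4.0 * c)) / (2.0 * c)


def rho_gauss(theta, c):
    """Limit (9) of the outlier singular value, for theta > c^(1/4)."""
    return math.sqrt((1.0 + theta ** 2) * (c + theta ** 2)) / theta


def phi_from_d(z, c):
    """Positive root of c * phi^2 + ((1 - c) / z) * phi - D(z) = 0."""
    k = (1.0 - c) / z
    return (-k + math.sqrt(k * k + 4.0 * c * d_gauss(z, c))) / (2.0 * c)


def projections_from_theorem(theta, c, h=1e-6):
    """Right-hand sides of a) and b) in Theorem 2.9, with a numerical D'."""
    rho = rho_gauss(theta, c)
    d_prime = (d_gauss(rho + h, c) - d_gauss(rho - h, c)) / (2.0 * h)
    p = phi_from_d(rho, c)
    p_tilde = c * p + (1.0 - c) / rho
    return (-2.0 * p / (theta ** 2 * d_prime),
            -2.0 * p_tilde / (theta ** 2 * d_prime))


def projections_closed_form(theta, c):
    """Closed forms (10) and (11), for theta >= c^(1/4)."""
    left = 1.0 - c * (1.0 + theta ** 2) / (theta ** 2 * (theta ** 2 + c))
    right = 1.0 - (c + theta ** 2) / (theta ** 2 * (theta ** 2 + 1.0))
    return left, right


theta, c = 1.5, 0.5
general = projections_from_theorem(theta, c)
closed = projections_closed_form(theta, c)
```

With these values both routes give approximately 0.737 for the left singular vector and 0.624 for the right one, and `d_gauss(rho_gauss(theta, c), c)` returns $1/\theta^2$, as required by the definition of $\rho$.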

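3.2. Square Haar unitary matrices. Let $X_n$ be a Haar distributed unitary (or orthogonal)
random matrix. All of its singular values are equal to one, so that it has limiting
spectral measure

\[ \mu_X = \delta_1, \]

with $a = b = 1$ being the end points of the support of $\mu_X$.

Associated with this spectral measure, we have (of course, $c = 1$)

\[ D_{\mu_X}(z) = \frac{z^2}{(z^2 - 1)^2}, \]

thus for all $\theta > 0$,

\[ D_{\mu_X}^{-1}(1/\theta^2) = \begin{cases} \dfrac{\theta + \sqrt{\theta^2 + 4}}{2} & \text{if the inverse is computed on } (1, +\infty), \\ \dfrac{-\theta + \sqrt{\theta^2 + 4}}{2} & \text{if the inverse is computed on } (0, 1). \end{cases} \]

Thus for any $n \times n$ rank $r$ perturbing matrix $P_n$ with $r$ non-zero singular values
$\theta_1 \ge \cdots \ge \theta_r$, where neither $r$ nor the $\theta_i$'s depend on $n$, for any fixed $i = 1, \ldots, r$, by
Theorem 2.8 we have

\[ \sigma_i(X_n + P_n) \longrightarrow \frac{\theta_i + \sqrt{\theta_i^2 + 4}}{2} \qquad \text{and} \qquad \sigma_{n+1-i}(X_n + P_n) \longrightarrow \frac{-\theta_i + \sqrt{\theta_i^2 + 4}}{2}, \]

while for any fixed $i \ge r + 1$, both $\sigma_i(X_n + P_n)$ and $\sigma_{n+1-i}(X_n + P_n)$ tend to 1.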
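A quick numerical sanity check of the Haar example, as a sketch with hypothetical function names: both limits solve $D_{\mu_X}(z) = 1/\theta^2$ for $\mu_X = \delta_1$ and $c = 1$, one on each side of the support, and their product is 1.

```python
import math


def d_haar(z):
    """D-transform of mu = delta_1 with c = 1: (z / (z^2 - 1))^2."""
    return (z / (z * z - 1.0)) ** 2


def largest_limit(theta):
    """Root of D(z) = 1 / theta^2 on (1, +inf)."""
    return (theta + math.sqrt(theta ** 2 + 4.0)) / 2.0


def smallest_limit(theta):
    """Root of D(z) = 1 / theta^2 on (0, 1)."""
    return (-theta + math.sqrt(theta ** 2 + 4.0)) / 2.0


theta = 1.5
top, bottom = largest_limit(theta), smallest_limit(theta)   # 2.0 and 0.5
```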
4. Proof of Theorems 2.8 and 2.13

The proofs of both theorems are quite similar. As a consequence, we only prove Theorem 2.8.

The sequence of steps described below yields the desired proof (which is very close to
the one of Theorem 2.1 of [17]):

(1) The first, rather trivial, step in the proof of Theorem 2.8 is to use Weyl's interlacing
inequalities to prove that any fixed-rank singular value of $\widetilde{X}_n$ which does not tend
to a limit $> b$ tends to $b$.

(2) Then, we utilize Lemma 4.1 below to express the extreme singular values of $\widetilde{X}_n$
as the $z$'s such that a certain random $2r \times 2r$ matrix $M_n(z)$ is singular.

(3) We then exploit convergence properties of certain analytical functions (derived in
the appendix) to prove that almost surely, $M_n(z)$ converges to a certain deterministic
matrix $M(z)$, uniformly in $z$.

(4) We then invoke a continuity lemma (see Lemma 8.1 in the appendix) to claim that
almost surely, the $z$'s such that $M_n(z)$ is singular (i.e. the extreme singular values
of $\widetilde{X}_n$) converge to the $z$'s such that $M(z)$ is singular.

(5) We conclude the proof by noting that, for our setting, the $z$'s such that $M(z)$ is
singular are precisely the $z$'s such that for some $i \in \{1, \ldots, r\}$, $D_{\mu_X}(z) = \frac{1}{\theta_i^2}$. Part
(ii) of Lemma 8.1, about the rank of $M_n(z)$, will be useful to assert that when
the $\theta_i$'s are pairwise distinct, the multiplicities of the isolated singular values are
all equal to one.

Firstly, up to a conditioning by the $\sigma$-algebra generated by the $X_n$'s, one can suppose
them to be deterministic and all the randomness supported by the perturbing matrix $P_n$.

Secondly, by [33, Th. 3.1.2], one has, for all $i \ge 1$,

\[ \sigma_{i+r}(X_n) \le \sigma_i(\widetilde{X}_n) \le \sigma_{i-r}(X_n), \]

with the convention $\sigma_j(X_n) = +\infty$ for $j \le 0$ and $0$ for $j > n$. By the same proof as in [TTl
Sect. 6.2.1], it follows that for all $i \ge 1$ fixed,

\[ \liminf \sigma_i(\widetilde{X}_n) \ge b \tag{12} \]

and that for all fixed $i > r$,

\[ \sigma_i(\widetilde{X}_n) \longrightarrow b \tag{13} \]

(we insist here on the fact that $i$ has to be fixed, i.e. not to depend on $n$: of course, for
$i = n/2$, (13) is not true anymore in general).

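Our approach is based on the following lemma, which reduces the problem to the study
of $2r \times 2r$ random matrices. Recall that the constants $r$, $\theta_1, \ldots, \theta_r$, and the random
column vectors (which depend on $n$, even though this dependence does not appear in the
notation) $u_1, \ldots, u_r$, $v_1, \ldots, v_r$ have been introduced in Section 2.1 and that the perturbing
matrix $P_n$ is given by

\[ P_n = \sum_{i=1}^{r} \theta_i u_i v_i^*. \]

Recall also that the singular values of $X_n$ are denoted by $\sigma_1 \ge \cdots \ge \sigma_n$. Let us define
the matrices

\[ \Theta = \operatorname{diag}(\theta_1, \ldots, \theta_r) \in \mathbb{R}^{r \times r}, \qquad U_n = [u_1 \cdots u_r] \in \mathbb{K}^{n \times r}, \qquad V_n = [v_1 \cdots v_r] \in \mathbb{K}^{m \times r}. \]

Lemma 4.1. The positive singular values of $\widetilde{X}_n$ which are not singular values of $X_n$ are
the $z \notin \{\sigma_1, \ldots, \sigma_n\}$ such that the $2r \times 2r$ matrix

\[ M_n(z) := \begin{bmatrix} z\, U_n^* (z^2 I_n - X_n X_n^*)^{-1} U_n & U_n^* (z^2 I_n - X_n X_n^*)^{-1} X_n V_n \\ V_n^* X_n^* (z^2 I_n - X_n X_n^*)^{-1} U_n & z\, V_n^* (z^2 I_m - X_n^* X_n)^{-1} V_n \end{bmatrix} - \begin{bmatrix} 0 & \Theta^{-1} \\ \Theta^{-1} & 0 \end{bmatrix} \]

is not invertible.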
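For the sake of completeness, we provide a proof, even though several related results
can be found in the literature (see e.g. [H [IS]).

Proof. Firstly, [32, Th. 7.3.7] states that the non-zero singular values of $\widetilde{X}_n$ are the
positive eigenvalues of $\begin{bmatrix} 0 & \widetilde{X}_n \\ \widetilde{X}_n^* & 0 \end{bmatrix}$. Secondly, for any $z > 0$ which is not a singular value
of $X_n$, by da Lem. 6.1],

\[ \det\left( z I_{n+m} - \begin{bmatrix} 0 & \widetilde{X}_n \\ \widetilde{X}_n^* & 0 \end{bmatrix} \right) = \det\left( z I_{n+m} - \begin{bmatrix} 0 & X_n \\ X_n^* & 0 \end{bmatrix} \right) \times (-1)^r \prod_{i=1}^{r} \theta_i^2 \times \det M_n(z), \]

which allows us to conclude, since by hypothesis, $\det\left( z I_{n+m} - \begin{bmatrix} 0 & X_n \\ X_n^* & 0 \end{bmatrix} \right) \ne 0$.

□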
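The determinant identity behind Lemma 4.1 can be checked by hand in the smallest possible case. The sketch below takes $n = m = r = 1$ with real scalars, so that $X_n = x$, $u = v = 1$ and the perturbed matrix is $x + \theta$; the function names are hypothetical and the scalar setting is an assumption made purely for illustration.

```python
def m_n_scalar(z, x, theta):
    """The 2 x 2 matrix M_n(z) of Lemma 4.1 when n = m = r = 1 and u = v = 1."""
    w = z * z - x * x
    diag = z / w                      # z * (z^2 - x^2)^(-1)
    off = x / w - 1.0 / theta         # (z^2 - x^2)^(-1) * x - theta^(-1)
    return [[diag, off], [off, diag]]


def det2(mat):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return mat[0][0] * mat[1][1] - mat[0][1] * mat[1][0]


def char_poly_block(z, s):
    """det(z * I_2 - [[0, s], [s, 0]]) = z^2 - s^2."""
    return z * z - s * s


x, theta = 0.7, 1.5
perturbed = x + theta                                   # singular value of x + theta
det_at_root = det2(m_n_scalar(perturbed, x, theta))     # vanishes, as the lemma predicts
z = 3.0
lhs = char_poly_block(z, perturbed)
rhs = char_poly_block(z, x) * (-theta ** 2) * det2(m_n_scalar(z, x, theta))
```

Here `lhs` and `rhs` agree, which is the identity $\det(zI - \widetilde{\mathcal{X}}) = \det(zI - \mathcal{X}) \times (-1)^r \prod_i \theta_i^2 \times \det M_n(z)$ with $r = 1$.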



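Note that by Assumption 2.1,

\[ \frac{1}{n} \operatorname{Tr} \frac{z}{z^2 I_n - X_n X_n^*} \underset{n \to \infty}{\longrightarrow} \int \frac{z}{z^2 - t^2}\, d\mu_X(t), \]

\[ \frac{1}{m} \operatorname{Tr} \frac{z}{z^2 I_m - X_n^* X_n} \underset{n \to \infty}{\longrightarrow} \int \frac{z}{z^2 - t^2}\, d\tilde\mu_X(t) \qquad (\tilde\mu_X = c\mu_X + (1 - c)\delta_0), \]

uniformly on any subset of $\{z \in \mathbb{C} \text{ s.t. } \Re(z) > b + \eta\}$, $\eta > 0$. It follows, by a direct
application of Ascoli's Theorem and Proposition 8.2, that almost surely, we have the
following convergence (which is uniform in $z$)

\[ U_n^* \frac{z}{z^2 I_n - X_n X_n^*} U_n \underset{n \to \infty}{\longrightarrow} \left( \int \frac{z}{z^2 - t^2}\, d\mu_X(t) \right) \cdot I_r, \]

\[ V_n^* \frac{z}{z^2 I_m - X_n^* X_n} V_n \underset{n \to \infty}{\longrightarrow} \left( \int \frac{z}{z^2 - t^2}\, d\tilde\mu_X(t) \right) \cdot I_r. \]

In the same way, almost surely,

\[ U_n^* (z^2 I_n - X_n X_n^*)^{-1} X_n V_n \underset{n \to \infty}{\longrightarrow} 0 \qquad \text{and} \qquad V_n^* X_n^* (z^2 I_n - X_n X_n^*)^{-1} U_n \underset{n \to \infty}{\longrightarrow} 0. \]

It follows that almost surely,

\[ M_n(z) \underset{n \to \infty}{\longrightarrow} M(z) := \begin{bmatrix} \varphi_{\mu_X}(z) I_r & 0 \\ 0 & \varphi_{\tilde\mu_X}(z) I_r \end{bmatrix} - \begin{bmatrix} 0 & \Theta^{-1} \\ \Theta^{-1} & 0 \end{bmatrix}, \tag{14} \]

where $\varphi_{\mu_X}$ and $\varphi_{\tilde\mu_X}$ are the functions defined in the statement of Theorem 2.9.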
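As a worked step (a sketch that uses only the block structure of the limit matrix $M(z)$, whose four $r \times r$ blocks are diagonal and therefore commute), the determinant of $M(z)$ factorizes, which makes explicit why $M(z)$ is singular exactly when $D_{\mu_X}(z) = 1/\theta_i^2$ for some $i$:

```latex
\det M(z)
  = \det \begin{bmatrix} \varphi_{\mu_X}(z) I_r & -\Theta^{-1} \\ -\Theta^{-1} & \varphi_{\tilde\mu_X}(z) I_r \end{bmatrix}
  = \det\bigl( \varphi_{\mu_X}(z)\, \varphi_{\tilde\mu_X}(z)\, I_r - \Theta^{-2} \bigr)
  = \prod_{i=1}^{r} \Bigl( D_{\mu_X}(z) - \frac{1}{\theta_i^{2}} \Bigr),
\qquad \text{since } D_{\mu_X}(z) = \varphi_{\mu_X}(z)\, \varphi_{\tilde\mu_X}(z).
```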

Now, note that once (12) has been established, our result only concerns the number of
singular values of $\widetilde{X}_n$ in $[b + \eta, +\infty)$ (for any $\eta > 0$), hence can be proved via Lemma 4.1.
Indeed, by Hypothesis 2.3, for $n$ large enough, $X_n$ has no singular value $> b + \eta$, thus
numbers $> b + \eta$ cannot be at the same time singular values of $X_n$ and $\widetilde{X}_n$.

In the case where the $\theta_i$'s are pairwise distinct, Lemma 8.1 allows us to conclude the proof
of Theorem 2.8. Indeed, Lemma 8.1 says that exactly as many singular values of $\widetilde{X}_n$
as predicted by the theorem have limits $> b$ and that their limits are exactly the ones
predicted by the theorem. The part of the theorem devoted to singular values tending
to $b$ can then be deduced from (12) and (13).

In the case where the $\theta_i$'s are not pairwise distinct, an approximation approach allows
us to conclude (proceed for example as in Section 6.2.3 of [17], using [32, Cor. 7.3.8 (b)]
instead of ^ Cor. 6.3.8]).



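5. Proof of Theorems 2.9 and 2.14

The proofs of both theorems are quite similar. As a consequence, we only prove Theorem 2.9.

As above, up to a conditioning by the $\sigma$-algebra generated by the $X_n$'s, one can suppose
them to be deterministic and all the randomness supported by the perturbing matrix $P_n$.

Firstly, by the Law of Large Numbers, even in the i.i.d. model, the $u_i$'s and the $v_i$'s are
almost surely asymptotically orthonormalized. More specifically, for all $i, j$,

\[ \langle v_i, v_j \rangle \underset{n \to \infty}{\longrightarrow} \mathbb{1}_{i = j} \]

(the same being true for the $u_i$'s). As a consequence, it is enough to prove that

a')
\[ \sum_{i \text{ s.t. } \theta_i = \theta_{i_0}} |\langle \tilde u, u_i \rangle|^2 \xrightarrow{\text{a.s.}} \frac{-2 \varphi_{\mu_X}(\rho)}{\theta_{i_0}^2 D'_{\mu_X}(\rho)}, \tag{15} \]

b')
\[ \sum_{i \text{ s.t. } \theta_i = \theta_{i_0}} |\langle \tilde v, v_i \rangle|^2 \xrightarrow{\text{a.s.}} \frac{-2 \varphi_{\tilde\mu_X}(\rho)}{\theta_{i_0}^2 D'_{\mu_X}(\rho)}, \tag{16} \]

c')
\[ \sum_{i \text{ s.t. } \theta_i \ne \theta_{i_0}} |\langle \tilde u, u_i \rangle|^2 + |\langle \tilde v, v_i \rangle|^2 \xrightarrow{\text{a.s.}} 0, \tag{17} \]

d')
\[ \sum_{i \text{ s.t. } \theta_i = \theta_{i_0}} |\varphi_{\mu_X}(\rho)\, \theta_{i_0} \langle \tilde v, v_i \rangle - \langle \tilde u, u_i \rangle|^2 \xrightarrow{\text{a.s.}} 0. \tag{18} \]

Again, the proof is based on a lemma which reduces the problem to the study of the
kernel of a random $2r \times 2r$ matrix. The matrices $\Theta$, $U_n$ and $V_n$ are the ones introduced
before Lemma 4.1.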
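Lemma 5.1. Let $z$ be a singular value of $\widetilde{X}_n$ which is not a singular value of $X_n$ and let
$(\tilde u, \tilde v)$ be a corresponding singular pair of unit vectors. Then the column vector

\[ \begin{bmatrix} \Theta V_n^* \tilde v \\ \Theta U_n^* \tilde u \end{bmatrix} \]

belongs to the kernel of the $2r \times 2r$ matrix $M_n(z)$ introduced in Lemma 4.1. Moreover,
we have

\[ \tilde v^* P_n^* \frac{z^2}{(z^2 I_n - X_n X_n^*)^2} P_n \tilde v + \tilde u^* P_n \frac{X_n^* X_n}{(z^2 I_m - X_n^* X_n)^2} P_n^* \tilde u + \tilde v^* P_n^* \frac{z X_n}{(z^2 I_m - X_n^* X_n)^2} P_n^* \tilde u + \tilde u^* P_n \frac{z X_n^*}{(z^2 I_n - X_n X_n^*)^2} P_n \tilde v = 1. \tag{19} \]

Proof. The first part of the lemma is easy to verify with the formula $X_n^* f(X_n X_n^*) =
f(X_n^* X_n) X_n^*$ for any function $f$ defined on $[0, +\infty)$. For the second part, use the formulas

\[ \widetilde{X}_n \widetilde{X}_n^* \tilde u = z^2 \tilde u \qquad \text{and} \qquad X_n^* \tilde u = z \tilde v - P_n^* \tilde u, \]

to establish $\tilde u = (z^2 I_n - X_n X_n^*)^{-1} (z P_n \tilde v + X_n P_n^* \tilde u)$, and then use the fact that $\tilde u^* \tilde u = 1$. □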



Let us consider $z_n = \tilde\sigma_{i_0}$ and $(\tilde u, \tilde v)$ as in the statement of Theorem 2.9. Note firstly that for $n$
large enough, $z_n > \sigma_1(X_n)$, hence Lemma 5.1 can be applied, and the vector

\[ \begin{bmatrix} \Theta V_n^* \tilde v \\ \Theta U_n^* \tilde u \end{bmatrix} = [\theta_1 \langle v_1, \tilde v \rangle, \ldots, \theta_r \langle v_r, \tilde v \rangle, \theta_1 \langle u_1, \tilde u \rangle, \ldots, \theta_r \langle u_r, \tilde u \rangle]^T \tag{20} \]

belongs to $\ker M_n(z_n)$. As explained in the proof of Theorem 2.8, the random matrix-valued
function $M_n(\cdot)$ converges almost surely uniformly to the matrix-valued function
$M(\cdot)$ introduced in Equation (14). Hence $M_n(z_n)$ converges almost surely to $M(\rho)$, and it
follows that the orthogonal projection on $(\ker M(\rho))^{\perp}$ of the vector of (20) tends almost
surely to zero.

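Let us now compute this projection. For $x, y$ column vectors of $\mathbb{K}^r$,

\[ M(\rho) \begin{bmatrix} x \\ y \end{bmatrix} = 0 \iff \forall i, \ y_i = \theta_i \varphi_{\mu_X}(\rho)\, x_i \ \text{ and } \ x_i = \theta_i \varphi_{\tilde\mu_X}(\rho)\, y_i \]

\[ \iff \forall i, \ \begin{cases} x_i = y_i = 0 & \text{if } \theta_i^2 \varphi_{\mu_X}(\rho) \varphi_{\tilde\mu_X}(\rho) \ne 1, \\ y_i = \theta_i \varphi_{\mu_X}(\rho)\, x_i & \text{if } \theta_i^2 \varphi_{\mu_X}(\rho) \varphi_{\tilde\mu_X}(\rho) = 1. \end{cases} \]

Note that $\rho$ is precisely defined by the relation $\theta_{i_0}^2 \varphi_{\mu_X}(\rho) \varphi_{\tilde\mu_X}(\rho) = 1$. Hence with $\beta :=
-\theta_{i_0} \varphi_{\mu_X}(\rho)$, we have

\[ \ker M(\rho) = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} \in \mathbb{K}^{r+r} \text{ s.t. } \forall i, \ x_i = y_i = 0 \text{ if } \theta_i \ne \theta_{i_0} \text{ and } y_i = -\beta x_i \text{ if } \theta_i = \theta_{i_0} \right\}, \]

hence

\[ (\ker M(\rho))^{\perp} = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} \in \mathbb{K}^{r+r} \text{ s.t. } \forall i, \ x_i = \beta y_i \text{ if } \theta_i = \theta_{i_0} \right\} \]

and the orthogonal projection of any vector $\begin{bmatrix} x \\ y \end{bmatrix}$ on $(\ker M(\rho))^{\perp}$ is the vector $\begin{bmatrix} x' \\ y' \end{bmatrix}$ such
that for all $i$,

\[ (x'_i, y'_i) = \begin{cases} (x_i, y_i) & \text{if } \theta_i \ne \theta_{i_0}, \\ \dfrac{\beta x_i + y_i}{\beta^2 + 1}\, (\beta, 1) & \text{if } \theta_i = \theta_{i_0}. \end{cases} \]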
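The projection formula for the coordinates with $\theta_i = \theta_{i_0}$ can be checked in the plane: the kernel direction is $(1, -\beta)$ and its orthogonal complement is spanned by $(\beta, 1)$. The sketch below uses real scalars and a hypothetical function name, purely for illustration.

```python
def project_onto_complement(x, y, beta):
    """Orthogonal projection of (x, y) onto span{(beta, 1)}, the complement of span{(1, -beta)}."""
    coeff = (beta * x + y) / (beta * beta + 1.0)
    return coeff * beta, coeff


# A vector lying in the kernel direction (1, -beta) projects to zero.
in_kernel = project_onto_complement(2.0, -1.0, 0.5)

# A generic vector: the residual must be a multiple of (1, -beta).
beta = -2.0
px, py = project_onto_complement(1.0, 3.0, beta)
residual = (1.0 - px, 3.0 - py)
```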
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Then, (|T7|) and (|T8|) are direct consequences of the fact that the projection of the vector 
of (1201) on (ker M(p))-'- tends to zero. 

Let us now prove flTB]) . By fITI?]) . we have 

an + bn + Cn + dn = 1, (21) 

with 

bn = ^*^n . = i: e,9,-^{u,, n)v* V, (23) 

r 



r 



Since the hmit of Zn is out of the support of /ix, one can apply Proposition 18.21 to assert 
that both Cn and dn have almost sure limit zero and that in the sums (122|) and (123|) . any 
term such that i 7^ j tends almost surely to zero. Moreover, by (fT7|) . these sums can also 
be reduced to the terms with index i such that 9i = Oi^. Tu sum up, we have 



j s.t. ei=9ig \ n m n n) 



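Now, note that since $z_n$ tends to $\rho$,

\[
\frac{1}{n}\operatorname{Tr}\frac{z_n^2}{(z_n^2 I_n - X_nX_n^*)^2} \longrightarrow \int \frac{\rho^2}{(\rho^2-t^2)^2}\,d\mu_X(t),
\]

hence by Proposition 8.2, almost surely,

\[
a_n = \sum_{i \text{ s.t. } \theta_i = \theta_{i_0}} \theta_i^2\,|\langle v_i,\widetilde{v}\rangle|^2\,\frac{1}{n}\operatorname{Tr}\frac{z_n^2}{(z_n^2 I_n - X_nX_n^*)^2} + o(1)
= \theta_{i_0}^2 \int \frac{\rho^2}{(\rho^2-t^2)^2}\,d\mu_X(t) \sum_{i \text{ s.t. } \theta_i = \theta_{i_0}} |\langle v_i,\widetilde{v}\rangle|^2 + o(1).
\]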

Moreover, by (18), for all $i$ such that $\theta_i = \theta_{i_0}$,

\[
|\langle u_i,\widetilde{u}\rangle|^2 = \theta_{i_0}^2\,\varphi_{\mu_X}(\rho)^2\,|\langle v_i,\widetilde{v}\rangle|^2 + o(1).
\]

It follows that

i s.t. di=diQ 

Since $a_n + b_n = 1 + o(1)$, we get



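The relations

\[
\int \frac{\rho^2}{(\rho^2-t^2)^2}\,d\mu_X(t) = \frac{1}{2}\rho^{-1}\varphi_{\mu_X}(\rho) - \frac{1}{2}\varphi_{\mu_X}'(\rho),
\qquad
\int \frac{t^2}{(\rho^2-t^2)^2}\,d\mu_X(t) = -\frac{1}{2}\rho^{-1}\varphi_{\mu_X}(\rho) - \frac{1}{2}\varphi_{\mu_X}'(\rho)
\]

allow one to recover the RHS of (16) easily. Via (18), one easily deduces (15).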
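
Relations of this kind follow from differentiating $\varphi_{\mu}(z) = \int \frac{z}{z^2-t^2}\,d\mu(t)$ under the integral sign, which gives $\int \frac{\rho^2}{(\rho^2-t^2)^2}\,d\mu = \frac{1}{2}(\varphi_\mu(\rho)/\rho - \varphi_\mu'(\rho))$ and $\int \frac{t^2}{(\rho^2-t^2)^2}\,d\mu = -\frac{1}{2}(\varphi_\mu(\rho)/\rho + \varphi_\mu'(\rho))$. The sketch below checks both numerically for a discrete measure. The atoms and the value of `rho` are hypothetical, and the measure is an assumption made only to keep the integrals finite sums.

```python
atoms = [0.2, 0.5, 0.9, 1.3]   # hypothetical support points, equal weights
rho = 2.0                       # any point to the right of the support

def mean(values):
    return sum(values) / len(values)

# phi(rho) = int rho / (rho^2 - t^2) dmu(t), and its derivative in rho.
phi = mean([rho / (rho**2 - t**2) for t in atoms])
dphi = mean([-(rho**2 + t**2) / (rho**2 - t**2) ** 2 for t in atoms])

# The two integrals appearing in the limits of a_n and b_n.
int_rho2 = mean([rho**2 / (rho**2 - t**2) ** 2 for t in atoms])
int_t2 = mean([t**2 / (rho**2 - t**2) ** 2 for t in atoms])

gap_first = abs(int_rho2 - (0.5 * phi / rho - 0.5 * dphi))
gap_second = abs(int_t2 - (-0.5 * phi / rho - 0.5 * dphi))
```

Both gaps vanish up to rounding, since the identities hold atom by atom.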

6. Proof of Theorems 2.10 and 2.15

Again, we shall only prove Theorem 2.10 and suppose the $X_n$'s to be non random.

Let us consider the matrix $M_n(z)$ introduced in Lemma 4.1. Here, $r = 1$, so one easily gets, for each $n$,

\[
\lim_{z\to+\infty} \det M_n(z) = -\theta^{-2}.
\]

Moreover, for $b_n := \sigma_1(X_n)$ the largest singular value of $X_n$, looking carefully at the term in $\frac{1}{z^2-b_n^2}$ in $\det M_n(z)$, it appears that with a probability which tends to one as $n \to \infty$, we have

\[
\lim_{z\to b_n} \det M_n(z) = +\infty.
\]

It follows that with a probability which tends to one as $n \to \infty$, the largest singular value $\widetilde{\sigma}_1$ of $\widetilde{X}_n$ is $> b_n$.



Then, one concludes using the second part of Lemma 5.1, as in the proof of Theorem 2.3 of [n].
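
The limit of $\det M_n(z)$ as $z \to +\infty$ can be illustrated numerically. The sketch below assumes that, for $r = 1$, $M_n(z)$ has diagonal entries $z\,u^*(z^2 I_n - XX^*)^{-1}u$ and $z\,v^*(z^2 I_m - X^*X)^{-1}v$ and off-diagonal entries $u^*(z^2 I_n - XX^*)^{-1}Xv - \theta^{-1}$ and its conjugate. Lemma 4.1 is not reproduced in this section, so this normalization is an assumption. The matrix $X = [\operatorname{diag}(s_1,\ldots,s_n) \mid 0]$, the vectors and the value of `theta` are hypothetical and real-valued.

```python
import math

def det_Mn(z, sing_vals, u, v, theta):
    """Determinant of the assumed 2x2 matrix M_n(z), X = [diag(sing_vals) | 0]."""
    n = len(sing_vals)
    m11 = sum(z * u[k] ** 2 / (z**2 - sing_vals[k] ** 2) for k in range(n))
    # X^* X has the eigenvalues sing_vals[k]^2 followed by (m - n) zeros.
    m22 = sum(
        z * v[k] ** 2 / (z**2 - (sing_vals[k] ** 2 if k < n else 0.0))
        for k in range(len(v))
    )
    m12 = sum(
        u[k] * sing_vals[k] * v[k] / (z**2 - sing_vals[k] ** 2) for k in range(n)
    ) - 1.0 / theta
    # Real symmetric case: the two off-diagonal entries coincide.
    return m11 * m22 - m12 * m12

sing_vals = [0.3, 0.8, 1.1]        # hypothetical singular values, n = 3
u = [1 / math.sqrt(3)] * 3         # unit vector in R^3
v = [0.5, 0.5, 0.5, 0.5]           # unit vector in R^4, so m = 4
theta = 1.7
far_value = det_Mn(1e6, sing_vals, u, v, theta)
```

For large $z$ the diagonal entries are of order $1/z$, so the determinant is dominated by $-\theta^{-2}$.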

7. Proof of Theorems 2.18 and 2.20

We shall only prove Theorem 2.18 because Theorem 2.20 can be proved similarly. We have supposed that $r = 1$. Let us denote $u = u_1$ and $v = v_1$. Then we have

\[
P_n = \theta u v^*,
\]

with $u \in \mathbb{K}^{n\times 1}$, $v \in \mathbb{K}^{m\times 1}$ random vectors whose entries are $\nu$-distributed independent random variables, renormalized in the orthonormalized model, and divided by respectively $\sqrt{n}$ and $\sqrt{m}$ in the i.i.d. model. We also have that the matrix $M_n(z)$ defined in Lemma 4.1 is a $2 \times 2$ matrix.

Let us fix an arbitrary $b^*$ such that $b < b^* < \rho$. Theorem 2.8 implies that almost surely, for $n$ large enough, $\det[M_n(\cdot)]$ vanishes exactly once in $(b^*, \infty)$. Since moreover, almost surely, for all $n$,

\[
\lim_{z\to+\infty} \det[M_n(z)] = -\frac{1}{\theta^2} < 0,
\]

we deduce that almost surely, for $n$ large enough, $\det[M_n(z)] > 0$ for $b^* < z < \widetilde{\sigma}_1$ and $\det[M_n(z)] < 0$ for $\widetilde{\sigma}_1 < z$.

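As a consequence, for any real number $x$, for $n$ large enough,

\[
\sqrt{n}(\widetilde{\sigma}_1 - \rho) < x \iff \det M_n\!\left(\rho + \frac{x}{\sqrt{n}}\right) > 0. \tag{24}
\]

Therefore, we have to understand the limit distributions of the entries of $M_n\!\left(\rho + \frac{x}{\sqrt{n}}\right)$.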



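They are given by the following lemma.

Lemma 7.1. For any fixed real number $x$, as $n \to \infty$, the distribution of

\[
\Gamma_n := \sqrt{n}\left( M_n\!\left(\rho + \frac{x}{\sqrt{n}}\right) - \begin{bmatrix} \varphi_{\mu_X}(\rho) & -\theta^{-1} \\ -\theta^{-1} & \varphi_{\tilde\mu_X}(\rho) \end{bmatrix} \right)
\]

converges weakly to the one of

\[
x \begin{bmatrix} \varphi_{\mu_X}'(\rho) & 0 \\ 0 & \varphi_{\tilde\mu_X}'(\rho) \end{bmatrix} + \begin{bmatrix} c_1 X & dZ \\ d\overline{Z} & c_2 Y \end{bmatrix},
\]

for $X, Y, Z$ (resp. $X, Y, \sqrt{2}\,\Re(Z), \sqrt{2}\,\Im(Z)$) independent standard real Gaussian variables if $\beta = 1$ (resp. if $\beta = 2$) and for $c_1, c_2, d$ some real constants given by the following formulas: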



„2 



(p2-t2)2 
n2 



dpx{t) 



13 J (p2_<2)2 
„2 



dpx{t) 

If t^ 

;dpx{t). 



in the i.i.d. model, 

in the orthonormalized model, 

in the i.i.d. model, 

in the orthonormalized model. 



d^ 

P J (p2 _ ^2)2 

Proof. Let us define $z_n := \rho + \frac{x}{\sqrt{n}}$. We have



(25) 

(26) 
(27) 



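\[
\Gamma_n = \begin{bmatrix}
\sqrt{n}\left( z_n u^*(z_n^2 I_n - X_nX_n^*)^{-1}u - \varphi_{\mu_X}(\rho) \right) & \sqrt{n}\,u^*(z_n^2 I_n - X_nX_n^*)^{-1}X_n v \\[4pt]
\sqrt{n}\,v^*X_n^*(z_n^2 I_n - X_nX_n^*)^{-1}u & \sqrt{n}\left( z_n v^*(z_n^2 I_m - X_n^*X_n)^{-1}v - \varphi_{\tilde\mu_X}(\rho) \right)
\end{bmatrix}.
\]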



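Let us for example expand the upper left entry $\Gamma_{n,1,1}$ of $\Gamma_n$. We have

\[
\Gamma_{n,1,1} = \sqrt{n}\left( u^*\frac{z_n}{z_n^2 I_n - X_nX_n^*}u - \frac{1}{n}\operatorname{Tr}\frac{z_n}{z_n^2 I_n - X_nX_n^*} \right)
+ \sqrt{n}\left( \frac{1}{n}\operatorname{Tr}\frac{z_n}{z_n^2 I_n - X_nX_n^*} - \varphi_{\mu_X}(z_n) \right)
+ \sqrt{n}\left( \varphi_{\mu_X}(z_n) - \varphi_{\mu_X}(\rho) \right). \tag{28}
\]

The third term of the RHS of (28) tends to $x\varphi_{\mu_X}'(\rho)$ as $n \to \infty$. By the Taylor-Lagrange formula, there is $\xi_n \in (0, 1)$ such that the second one is equal to

\[
\sqrt{n}\left( \frac{1}{n}\operatorname{Tr}\frac{\rho}{\rho^2 I_n - X_nX_n^*} - \varphi_{\mu_X}(\rho) \right)
+ x\,\frac{\partial}{\partial z}\bigg|_{z = \rho + \xi_n x/\sqrt{n}} \left( \frac{1}{n}\operatorname{Tr}\frac{z}{z^2 I_n - X_nX_n^*} - \varphi_{\mu_X}(z) \right),
\]

hence tends to zero, by Assumptions 2.1 and 2.19. To sum up, we have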



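\[
\Gamma_{n,1,1} = \sqrt{n}\left( u^*\frac{z_n}{z_n^2 I_n - X_nX_n^*}u - \frac{1}{n}\operatorname{Tr}\frac{z_n}{z_n^2 I_n - X_nX_n^*} \right) + x\varphi_{\mu_X}'(\rho) + o(1). \tag{29}
\]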



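In the same way, we have

\[
\Gamma_{n,2,2} = \sqrt{n}\left( v^*\frac{z_n}{z_n^2 I_m - X_n^*X_n}v - \frac{1}{m}\operatorname{Tr}\frac{z_n}{z_n^2 I_m - X_n^*X_n} \right) + x\varphi_{\tilde\mu_X}'(\rho) + o(1). \tag{30}
\]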



Then the "$\kappa_4(\nu) = 0$" case of Theorem 6.4 of [15] allows one to conclude. □

Let us now complete the proof of Theorem 2.18. By the previous lemma, we have

detM.(p+^) = 

det ( ['^^^ ^P'^ ~^ M + ^ [^'^^^ ^^^"^ 1 ^ 

for some random variables $X_n, Y_n, Z_n$ converging in distribution to the random variables $X, Y, Z$ of the previous lemma. Using the relation $\varphi_{\mu_X}(\rho)\varphi_{\tilde\mu_X}(\rho) = \theta^{-2}$, we get

det M„(p+^) 

= + ^ {2x9~^ + ^^^.(p)c2K„ + ipjiMciXn + e-^diZn + Z;)} + O (i) 
Thus by (24), we have



lim F{^/n{al - p) < x} = lim P{det M„ ( p + ^ ) > 0} 

= P{-^(^^,(p)c2F + y.^,(p)ciX + rt(Z + Z)) <x}. 

It follows that the distribution of $\sqrt{n}(\widetilde{\sigma}_1 - \rho)$ converges weakly to the one of $sX$, for $X$ a standard Gaussian random variable on $\mathbb{R}$ and

S' = J {{VjlxiP)cir + iV,xxip)^2? + ^e-'dy) . 

One can easily recover the formula given in Theorem 2.18 for $s^2$, using the relation



8. Appendix 



We now state the continuity lemma that we use in the proof of Theorem 2.8. We note that nothing in its hypotheses is random. As hinted earlier, we will invoke it to localize the extreme singular values of $\widetilde{X}_n$.

Lemma 8.1. We suppose the positive real numbers $\theta_1, \ldots, \theta_r$ to be pairwise distinct. Let us fix a real number $0 < b$ and two analytic functions $\varphi_1, \varphi_2$ defined on $\{z \in \mathbb{C} \text{ s.t. } \Re(z) > 0\} \setminus [0, b]$ such that for all $i = 1, 2$,

a) $\varphi_i(z) \in \mathbb{R} \iff z \in \mathbb{R}$,

b) for all $z > b$, $\varphi_i'(z) < 0$,

c) $\varphi_i(z) \to 0$ as $|z| \to \infty$.

Let us define the $2r \times 2r$-matrix-valued function

\[
M(z) := \begin{bmatrix} 0 & \Theta^{-1} \\ \Theta^{-1} & 0 \end{bmatrix} - \begin{bmatrix} \varphi_1(z) I_r & 0 \\ 0 & \varphi_2(z) I_r \end{bmatrix}
\]

and denote by $z_1 > \cdots > z_p$ the $z$'s in $(b, \infty)$ such that $M(z)$ is not invertible, where $p \in \{0, \ldots, r\}$ is the number of $\theta_i$'s such that

\[
\lim_{z \downarrow b} \varphi_1(z)\varphi_2(z) > \frac{1}{\theta_i^2}.
\]

Let us also consider a sequence $0 < b_n$ with limit $b$ and, for each $n$, a $2r \times 2r$-matrix-valued function $M_n(\cdot)$, defined on

\[
\{z \in \mathbb{C} \text{ s.t. } \Re(z) > 0\} \setminus [0, b_n],
\]

whose coefficients are analytic functions, such that

d) for all $z \notin \mathbb{R}$, $M_n(z)$ is invertible,

e) for all $\eta > 0$, $M_n(\cdot)$ converges to the function $M(\cdot)$ uniformly on $\{z \in \mathbb{C} \text{ s.t. } \Re(z) > b + \eta\}$.

Then

(i) there exist $p$ real sequences $z_{n,1} > \cdots > z_{n,p}$ converging respectively to $z_1, \ldots, z_p$ such that for any $\varepsilon > 0$ small enough, for $n$ large enough, the $z$'s in $(b + \varepsilon, \infty)$ such that $M_n(z)$ is not invertible are exactly $z_{n,1}, \ldots, z_{n,p}$,

(ii) for $n$ large enough, for each $i$, $M_n(z_{n,i})$ has rank $2r - 1$.
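Proof. To prove this lemma, we use the formula

\[
\det \begin{bmatrix} x I_r & \operatorname{diag}(a_1, \ldots, a_r) \\ \operatorname{diag}(a_1, \ldots, a_r) & y I_r \end{bmatrix} = \prod_{i=1}^{r} (xy - a_i^2)
\]

in the appropriate place and proceed as in the proof of Lemma 6.1 in [17]. □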
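
The determinant formula for a block matrix with diagonal blocks $xI_r$, $yI_r$ and off-diagonal blocks $\operatorname{diag}(a_1, \ldots, a_r)$, namely $\prod_{i=1}^r (xy - a_i^2)$, can be checked numerically. The sketch below is only an illustration: the helper names and the values of `x`, `y` and `a` are hypothetical, and the determinant is computed by plain Gaussian elimination with partial pivoting.

```python
def det(mat):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in mat]
    size = len(a)
    result = 1.0
    for k in range(size):
        pivot = max(range(k, size), key=lambda i: abs(a[i][k]))
        if abs(a[pivot][k]) < 1e-15:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            result = -result          # a row swap flips the sign
        result *= a[k][k]
        for i in range(k + 1, size):
            factor = a[i][k] / a[k][k]
            for j in range(k, size):
                a[i][j] -= factor * a[k][j]
    return result

def block_matrix(x, y, a):
    """Build [[x I_r, diag(a)], [diag(a), y I_r]] as a list of rows."""
    r = len(a)
    m = [[0.0] * (2 * r) for _ in range(2 * r)]
    for i in range(r):
        m[i][i] = x
        m[r + i][r + i] = y
        m[i][r + i] = a[i]
        m[r + i][i] = a[i]
    return m

x, y, a = 1.5, -0.4, [0.3, 1.1, 2.0]   # hypothetical test values
lhs = det(block_matrix(x, y, a))
rhs = 1.0
for a_i in a:
    rhs *= x * y - a_i ** 2
```

With $x = \varphi_1(z)$, $y = \varphi_2(z)$ and $a_i = \theta_i^{-1}$, the product vanishes exactly when $\varphi_1(z)\varphi_2(z) = \theta_i^{-2}$ for some $i$, which is how the formula locates the $z$'s where $M(z)$ is not invertible.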

We also need the following proposition. The $u_i$'s and the $v_i$'s are the random column vectors introduced in Section 2.1.

Proposition 8.2. Let, for each $n$, $A_n$, $B_n$ be complex $n \times n$, $n \times m$ matrices whose operator norms, with respect to the canonical Hermitian structure, are bounded independently of $n$. Then for any $\eta > 0$, there exist $C, \alpha > 0$ such that for all $n$, for all $i, j \in \{1, \ldots, r\}$ such that $i \neq j$,

\[
\mathbb{P}\left\{ \left|\langle u_i, A_n u_i\rangle - \frac{1}{n}\operatorname{Tr}(A_n)\right| > \eta \ \text{ or } \ |\langle u_i, A_n u_j\rangle| > \eta \ \text{ or } \ |\langle u_i, B_n v_j\rangle| > \eta \right\} \le C e^{-n^{\alpha}}.
\]

Proof. In the i.i.d. model, this result is an obvious consequence of [15, Prop. 6.2]. In the orthonormalized model, one also has to use [15, Prop. 6.2], which states that the $u_i$'s (the same holds for the $v_i$'s) are obtained from the $n \times r$ matrix $G^{(n)}$ with i.i.d. entries distributed according to $\nu$ by the following formula: for all $i = 1, \ldots, r$,

\[
u_i = \frac{i\text{th column of } G^{(n)} \times (W^{(n)})^T}{\left\| i\text{th column of } G^{(n)} \times (W^{(n)})^T \right\|_2},
\]

where $W^{(n)}$ is a (random) $r \times r$ matrix such that for certain positive constants $D, c, \kappa$, for all $\varepsilon > 0$ and all $n$,



> e or max 

l<i<r 



^lUth column of G^") x (W^^'Yl 
/n 



>s}< /^(e-'^^'+e- 
□ 
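
To give a feeling for the first two events in Proposition 8.2, the following simulation sketch compares $\langle u, A_n u\rangle$ with $\frac{1}{n}\operatorname{Tr}(A_n)$ and evaluates a cross term $\langle u, A_n w\rangle$ for two independent vectors. It assumes the i.i.d. model with $\nu$ the standard Gaussian law and a diagonal test matrix with bounded entries. These choices, the seed and the variable names are all hypothetical, and the sketch illustrates the concentration phenomenon rather than the exponential bound itself.

```python
import math
import random

random.seed(0)
n = 4000

# Hypothetical bounded test matrix A_n = diag(a_1, ..., a_n).
diag_a = [((-1) ** k) * (k % 7) / 7.0 + 0.5 for k in range(n)]

# i.i.d. model: independent entries with law nu, divided by sqrt(n).
u = [random.gauss(0.0, 1.0) / math.sqrt(n) for _ in range(n)]
w = [random.gauss(0.0, 1.0) / math.sqrt(n) for _ in range(n)]

quad_form = sum(a_k * u_k * u_k for a_k, u_k in zip(diag_a, u))   # <u, A u>
cross_term = sum(a_k * u_k * w_k for a_k, u_k, w_k in zip(diag_a, u, w))
trace_avg = sum(diag_a) / n                                        # Tr(A)/n

deviation = abs(quad_form - trace_avg)
```

Both `deviation` and `abs(cross_term)` are of order $n^{-1/2}$ in this setting, which is consistent with the quadratic form concentrating around the normalized trace and the cross terms vanishing.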